SSDR: An Algorithm for Clustering Categorical Data Using Rough Set Theory

Authors

  • B. K. Tripathy
  • Adhir Ghosh
Abstract

At present, a large number of clustering algorithms are available to group objects having similar characteristics. However, implementing many of these algorithms is challenging when the data are categorical. While some of the existing algorithms cannot handle categorical data, others are unable to handle uncertainty, and many also suffer from stability and efficiency problems. This has motivated the development of algorithms that cluster categorical data while also dealing with uncertainty. In 2007, an algorithm termed MMR was proposed [3], which uses concepts from rough set theory to address these problems in clustering categorical data. In 2009, this algorithm was improved into MMeR [2], which can also handle hybrid data. More recently, in 2011, MMeR was further improved into an algorithm called SDR [22], which likewise handles hybrid data. The last two algorithms handle uncertainty and categorical data simultaneously, but SDR is more efficient than both MMeR and MMR. In this paper, we propose a new algorithm in this sequence, which we call SSDR (Standard deviation of Standard Deviation Roughness), and which outperforms all its predecessors: MMR, MMeR and SDR. SSDR handles numerical and categorical data simultaneously while also taking care of uncertainty, and it performs better when tested on well-known datasets.

Keywords: Clustering, MMeR, MMR, SDR, SSDR, uncertainty.
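The abstract says that MMR and its successors rely on rough set theory but does not spell out the measures involved. As an illustration only, the sketch below computes Pawlak's standard roughness of a set of objects X with respect to a single categorical attribute B, roughness_B(X) = 1 - |lower_B(X)| / |upper_B(X)|, where the lower approximation is the union of equivalence classes contained in X and the upper approximation is the union of classes that intersect X. It is not the SSDR criterion (which the abstract does not define), and the MMR family may define its own roughness measures differently; the function names `equivalence_classes` and `roughness` and the list-of-tuples table layout are hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import Hashable, Sequence

# Hypothetical table layout: each row is a tuple of categorical values.
Row = Sequence[Hashable]


def equivalence_classes(rows: Sequence[Row], attr: int) -> list[set[int]]:
    """Group row indices that share the same value in column `attr`."""
    classes: dict[Hashable, set[int]] = defaultdict(set)
    for i, row in enumerate(rows):
        classes[row[attr]].add(i)
    return list(classes.values())


def roughness(rows: Sequence[Row], target: set[int], attr: int) -> float:
    """Pawlak roughness of the object set `target` w.r.t. column `attr`:
    1 - |lower approximation| / |upper approximation|."""
    lower: set[int] = set()
    upper: set[int] = set()
    for cls in equivalence_classes(rows, attr):
        if cls <= target:   # class lies entirely inside the target set
            lower |= cls
        if cls & target:    # class overlaps the target set
            upper |= cls
    if not upper:           # empty target: nothing to approximate
        return 0.0
    return 1.0 - len(lower) / len(upper)


if __name__ == "__main__":
    table = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "z")]
    # Objects whose first attribute equals "a": {0, 1}
    target = {i for i, row in enumerate(table) if row[0] == "a"}
    # Classes by column 1: {0}, {1, 2}, {3}; lower = {0}, upper = {0, 1, 2}
    print(round(roughness(table, target, 1), 3))  # 0.667
```

A roughness of 0 means the attribute describes the set exactly; values closer to 1 mean the attribute leaves more objects in the uncertain boundary region, which is the kind of uncertainty the rough-set clustering algorithms above are designed to account for.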

Similar resources

MMeMeR: An Algorithm for Clustering Heterogeneous Data using Rough Set Theory

In recent times, innumerable clustering algorithms have been developed whose main function is to form sets of objects having almost the same features. However, the presence of categorical data values makes these algorithms challenging to implement. Also, some algorithms that can handle categorical data are unable to process uncertainty in the values and so h...

A Rough Set Approach for Customer Segmentation

Customer segmentation is a process that divides a business’s total customers into groups according to their diversity of purchasing behavior and characteristics. The data mining clustering technique can be used to accomplish this customer segmentation. This technique clusters the customers in such a way that the customers in one group behave similarly when compared to the customers in other gro...

MMR: An algorithm for clustering categorical data using Rough Set Theory

A variety of cluster analysis techniques exist to group objects having similar characteristics. However, the implementation of many of these techniques is challenging due to the fact that much of the data contained in today’s databases is categorical in nature. While there have been recent advances in algorithms for clustering categorical data, some are unable to handle uncertainty in the clust...

Hierarchical clustering algorithm for categorical data using a probabilistic rough set model

Several clustering analysis techniques for categorical data exist to divide similar objects into groups. Some are able to handle uncertainty in the clustering process, whereas others have stability issues. In this paper, we propose a new technique called TMDP (Total Mean Distribution Precision) for selecting the partitioning attribute based on probabilistic rough set theory. On the basis of thi...

Rough K-modes Clustering Algorithm Based on Entropy

Cluster analysis is an important technique used in data mining. Categorical data clustering has received a great deal of attention in recent years. Some existing algorithms for clustering categorical data do not consider the importance of attributes for clustering, thereby reducing the efficiency of clustering analysis and limiting its application. In this paper, we propose a novel rough k-mode...

Publication date: 2011